Guided Policy Search via Approximate Mirror Descent

نویسندگان

  • William H. Montgomery
  • Sergey Levine
چکیده

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Guided Policy Search as Approximate Mirror Descent

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...

متن کامل

A Free Line Search Steepest Descent Method for Solving Unconstrained Optimization Problems

In this paper, we solve unconstrained optimization problem using a free line search steepest descent method. First, we propose a double parameter scaled quasi Newton formula for calculating an approximation of the Hessian matrix. The approximation obtained from this formula is a positive definite matrix that is satisfied in the standard secant relation. We also show that the largest eigen value...

متن کامل

Provable Bayesian Inference via Particle Mirror Descent

Since the prox-mapping of stochastic mirror descent is intractable when directly being applied to the optimization problem (1), we propose the -inexact prox-mapping within the stochastic mirror descent framework in Section 3. Instead of solving the prox-mapping exactly, we approximate the solution with error. In this section, we will show as long as the approximation error is tolerate, the stoc...

متن کامل

Mirror Descent Search and Acceleration

In recent years, attention has been focused on the relationship between black box optimization and reinforcement learning. Black box optimization is a framework for the problem of finding the input that optimizes the output represented by an unknown function. Reinforcement learning, by contrast, is a framework for finding a policy to optimize the expected cumulative reward from trial and error....

متن کامل

Scalable Bayesian Inference via Particle Mirror Descent

Bayesian methods are appealing in their flexibility in modeling complex data and ability in capturing uncertainty in parameters. However, when Bayes’ rule does not result in tractable closed-form, most approximate inference algorithms lack either scalability or rigorous guarantees. To tackle this challenge, we propose a simple yet provable algorithm, Particle Mirror Descent (PMD), to iterativel...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016